This paper deals with a nonparametric shape respecting estimation
method for U-shaped or unimodal functions. A general upper
bound for the nonasymptotic $L_1$-risk of the estimator is given. The
method is applied to the shape respecting estimation of several classical
functions, among them typical intensity functions encountered
in the reliability field. In each case, we derive from our upper bound
the spatially adaptive property of our estimator with respect to the
$L_1$-metric: it approximately behaves as the best variable binwidth
histogram of the function under estimation.

1. Introduction. In this paper we study a data-driven nonparametric 
estimation method for shape restricted functions. As an application of this 
study, we first have in mind classical frameworks such as estimation of uni- 
modal densities or regression functions. We also place stress on building and 
studying shape respecting estimators of typical intensity functions, namely 
the hazard rate of an absolutely continuous distribution and the failure rate 
of a nonhomogeneous Poisson process, which are key concerns in systems re- 
liability studies: for a nonreparable system, which is replaced by a new one 
after it fails, the failure behavior is modeled by the distribution of its single 
lifetime, frequently specified via its hazard rate. For reparable systems, re- 
paired but not replaced after each failure, the failure behavior in time can 
be modeled by a counting process. When repair times can be disregarded 
and the system has a large number of units, this counting process can be 
approximated by a nonhomogeneous Poisson process. Such a process is to- 
tally characterized by its cumulative intensity function or, when it exists, 
by its failure rate. 

Nonparametric estimation procedures have often been investigated first
for density estimation and regression and then generalized to other
frameworks. The most widely used are smoothing or projection methods with
fixed parameters (see [24] and [9] for densities and [25] for regression func- 
tions). Several estimators of this type have been proposed and studied for 
the hazard rate, under censored and uncensored schemes (see [28, 30, 33]). 
In other respects, Curioni [11] studies histograms and kernel estimators of 
the failure rate of a nonhomogeneous Poisson process, based on the observa- 
tion of replications of the process. Even if the difficult problem of the choice 
of the smoothing parameter can be overcome by automatic methods such 
as cross-validation (see [26] for density estimation, [19] and [10] in reliabil- 
ity), the main handicap of those methods lies in their rigidity: they tend to 
assume that the unknown function has homogeneous variation everywhere. 
In other words, they are not sensitive enough to the local concentration 
of the data. Such a drawback clearly appears in the problem of estimating 
the hazard rate by fixed bandwidth kernel estimators: the local variance of 
those estimators tends to increase towards infinity as the number of sys- 
tems at risk decreases. Obviously, these methods are totally misleading for 
estimating the failure rate of a nonhomogeneous Poisson process in realistic 
situations, where one generally observes a small number of replications of the 
failure time process on a finite time period. Indeed, the system's condition 
at time t depends on its whole history before t so that the situation is truly 
nonasymptotic. One therefore needs estimation methods flexible enough to 
balance the lack of information collected by fitting the data as well as pos- 
sible, making a locally sensitive choice of the parameter. For that purpose, 
variable bandwidth kernel estimators and variable binwidth histograms have 
been studied, first for densities by Stone [29], and then for the hazard rate 
by Müller and Wang [21, 22]. Bartoszyński, Brown, McBride and Thompson
[3] propose a variable bandwidth kernel estimator of the failure rate, based 
on the observation of replications of the process. The choice of the local 
bandwidth is generally done by the minimization of an asymptotic mean 
square error estimator. 

Another way of building adaptive tools is to look for the nonparamet- 
ric maximum likelihood estimate over a restricted class of functions, under 
which the likelihood is to be maximized. Contrary to kernel estimators, the 
construction of these estimators does not require either a smoothing param- 
eter or any smoothness assumption on the unknown function and only relies 
on very natural shape restrictions. Brunk [8] proposes the isotonic estimator 
for monotone regression functions and Durot [14] studies its good asymptotic 
properties related to the $L_1$-metric. Such estimators for decreasing hazard
rates have been put forward by Barlow, Bartholomew, Bremner and Brunk
[1] in complete life data models. Similarly, Bartoszyński, Brown, McBride
and Thompson [3] and Barlow, Proschan and Scheuer [2] propose the
nonparametric maximum likelihood estimator for decreasing failure rates. The
shape restriction in the last two cases is very natural since it corresponds to 
the observation of a system during its debugging period. For a decreasing 
density, the nonparametric maximum likelihood estimate is known as the 
Grenander estimator [15]. It has a very simple graphical meaning since it is 
the slope of the least concave majorant of the empirical distribution function 
based on a sample generated by the density under estimation. It takes the 
form of a variable binwidth histogram, generating a partition which is
approximately the best one in the $L_1$-metric sense. This property is checked by
Birgé [5, 6] from a nonasymptotic minimax risk point of view and
Groeneboom [16] and Groeneboom, Hooghiemstra and Lopuhaä [17] study its good
asymptotic $L_1$-properties. The construction of the Grenander estimator and
its properties can straightforwardly be extended to the case of a unimodal 
density with known mode. Nevertheless, a more realistic assumption is that 
the mode is unknown. Actually, the nonparametric maximum likelihood es- 
timator does not exist any more on such a wide class. One can solve the 
problem (see [31, 32]) by finding a prior estimate of the mode, but the
resulting properties rely on the choice of this estimate. In several studies, Birgé
[4, 7] proposes a totally data-driven estimation method for unimodal
densities with unknown mode: his estimator relies neither on the arbitrary choice
of extra parameters nor on any smoothness assumption on the unknown
density. It still approximately behaves as the best histogram in terms of the
nonasymptotic minimax $L_1$-risk, over restricted sets of unimodal densities.

Our purpose in this paper is to extend Birgé's method to a more general
functional estimation framework. More precisely, we aim to define
and study estimators for positive integrable functions $g$, assumed to be
unimodal or U-shaped (decreasing then increasing). The unimodal assumption
is often realistic for regression or density functions, while U-shaped hazard
rate or failure rate functions correspond to the failure behavior of a system
which is observed during its entire lifetime: after a debugging period where
the number of failures tends to decrease, the failure rate is stable during the
exploitation period, and then increases as the system deteriorates from aging.
Starting from a step function estimator $\hat G$ of $G = \int_a^{\cdot} g(t)\,dt$
on $I = [a,b]$, we define the shape respecting estimator $\hat g$ of $g$ as the
image of $\hat G$ through some deterministic mapping. The definition of this
mapping relies on a convenient adaptation of the "Pool Adjacent Violators
Algorithm" (see [1]) which is involved in the definition of the Grenander estimator.

This paper is organized as follows: in Section 2, we define and study
this mapping in a deterministic framework. This study is applied
in Section 3 to a statistical framework: we build a general upper bound
for the $L_1$-risk of the shape respecting estimator of U-shaped or unimodal
functions and investigate conditions under which $\hat g$ behaves as a "clever"
histogram, generating on its own a partition which is optimal from an
$L_1$-risk point of view. Section 4 is devoted to the application of our results
to particular functions: we first study the shape respecting estimator of
unimodal regression and density functions. We next build and study shape
respecting estimators for a U-shaped hazard rate and a U-shaped failure
rate in realistic underinformed designs. The proofs of our results are given
in the last three sections.

2. Deterministic framework. Our general aim in this paper is to estimate
a shape restricted function $g$ defined on a given compact real interval
$I = [a, b]$. In Section 3 we will show that we can define a shape respecting
estimator of $g$ as the image of a step function estimator of $G = \int_a^{\cdot} g(t)\,dt$,
through a deterministic mapping. This section is thus devoted to the
construction and the study of such a mapping. More precisely, we are interested
in mappings from the cone $\mathcal{H}(I)$ of nondecreasing, right-hand continuous
with left-hand limits (cadlag) step functions on $I$ into particular sets of
shape restricted, integrable functions on $I$. We focus here on the set of
U-shaped functions and the set of unimodal functions on $I$, defined as follows:

Definition 1. A function $g$ defined on an interval $I = [a, b]$ is a U-shaped
(resp. unimodal) function if there exists some number $m$ in $I$ such that $g$
is nonincreasing (resp. nondecreasing) on $[a,m]$ and nondecreasing (resp.
nonincreasing) on $[m, b]$.

For the sake of simplicity, we shall restrict ourselves to the U-shaped case. 
The unimodal case is briefly described in Remark 3. Moreover, we approach 
G by a nondecreasing function, which implies that g is positive, but this 
assumption may probably be dropped (see Remark 5). 

2.1. Construction of the mapping. Let $m \in I$ be an arbitrary point. To
be clearer, we first define a mapping $U_S^m$ from $\mathcal{H}(I)$ into
$\mathcal{U}_S^m(I)$, where $\mathcal{U}_S^m(I)$
is the set of U-shaped functions on $I$ whose minimum is achieved at $m$ (when
$m$ coincides with one of the endpoints of $I$, we get the important subsets of
nonincreasing and nondecreasing functions). This mapping generalizes the
isotonic mapping classically used in various contexts of functional estimation
under monotonicity restrictions: when $g$ is a decreasing function on $I = [a, b]$,
it is defined as the slope of the least concave majorant of an approximation
$F$ of $G$ and can be computed using the "Pool Adjacent Violators Algorithm"
(PAVA), described in [1]. Formally, the mapping $U_S^m$ is defined as follows:

Definition 2. Let $I = [a,b]$ be a compact real interval and let $m$ be
an arbitrary point in $I$. Let $F \in \mathcal{H}(I)$. We define $U_S^m(F)$ as the right-hand
continuous slope of $F_S^m$, where $F_S^m$ is defined on $[a,m]$ as the least concave
majorant of the restriction of $F$ to $[a,m]$ and is defined on $[m,b]$ as the
greatest convex minorant of the restriction of $F$ to $[m,b]$. The function $F_S^m$
is called the U-shaped regularization at $m$ of $F$ on $I$.

Let us notice that $F_S^m$ is a continuous, piecewise affine function on $I$. We
now turn to the definition of our main mapping $U_S$ from $\mathcal{H}(I)$ into $\mathcal{U}_S(I)$,
where $\mathcal{U}_S(I) = \bigcup_{m \in I} \mathcal{U}_S^m(I)$ is the set of all U-shaped functions on $I$. For
this purpose, we use an idea introduced by Birgé [4] in the context of the
estimation of a unimodal density. It consists in minimizing on $I$ the function
$d_S$ defined by

$$d_S(m) = \sup_{t \in I} |F(t) - F_S^m(t)|.$$

It is easy to see that $d_S$ is a cadlag step function on $I$, whose discontinuity
points belong to the set of discontinuity points of $F$. This property gives a
sense to the following:

Definition 3. Let $F \in \mathcal{H}(I)$ and let $m(F)$ denote the midpoint of the
interval where the function $d_S$ defined above achieves its minimum. We
define the mapping $U_S$ from $\mathcal{H}(I)$ into $\mathcal{U}_S(I)$ by $U_S(F) = U_S^{m(F)}(F)$.

It is worth noticing that $U_S(F)$ is easily computable in practice. The
determination of $m(F)$ does not cause any trouble since we need to compare
only a finite number of regularizations. Moreover, for any $m \in I$, the
regularization $F_S^m$ can be computed via the PAVA: the algorithm is applied on
$[a, m]$ to compute the least concave majorant of the restriction of $F$ to $[a, m]$,
then on $[m, b]$ to compute the greatest convex minorant of the restriction of
$F$ to $[m,b]$.
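
The construction just described can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the implementation used in the paper: the step function is represented by its values `F` at a finite list of knots `x` (both names are hypothetical), the hulls are taken through those knot values, the supremum defining the selection criterion is evaluated at the knots only, and the selected point is the first minimizing knot rather than the midpoint of the minimizing interval.

```python
def pava(values, weights):
    """Weighted nondecreasing least-squares fit (Pool Adjacent Violators)."""
    blocks = []  # each block is [mean, total weight, number of pooled entries]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Pool the last two blocks while they violate the nondecreasing order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


def u_shaped_slopes(x, F, j):
    """Slopes of the U-shaped regularization of the knot values at m = x[j]."""
    widths = [x[i + 1] - x[i] for i in range(len(x) - 1)]
    slopes = [(F[i + 1] - F[i]) / widths[i] for i in range(len(widths))]
    # Least concave majorant on [a, m]: nonincreasing fit of the raw slopes.
    left = [-s for s in pava([-s for s in slopes[:j]], widths[:j])]
    # Greatest convex minorant on [m, b]: nondecreasing fit of the raw slopes.
    right = pava(slopes[j:], widths[j:])
    return left + right, widths


def sup_gap(F, slopes, widths):
    """Max over knots of |F - regularization| (knot-level stand-in for the criterion)."""
    reg, gap = float(F[0]), 0.0
    for i, (s, w) in enumerate(zip(slopes, widths)):
        reg += s * w
        gap = max(gap, abs(F[i + 1] - reg))
    return gap


def u_shaped_map(x, F):
    """Return (index of the selected knot, slopes of the selected regularization)."""
    best = None
    for j in range(len(x)):
        slopes, widths = u_shaped_slopes(x, F, j)
        gap = sup_gap(F, slopes, widths)
        if best is None or gap < best[0]:
            best = (gap, j, slopes)
    return best[1], best[2]
```

For instance, `u_shaped_map([0, 1, 2, 3, 4], [0, 3, 4, 5, 9])` selects the knot of index 1 and returns the slopes `[3.0, 1.0, 1.0, 4.0]`, since the raw slopes are already U-shaped there. Only a finite number of candidate knots is compared, which mirrors the remark that the minimization involves finitely many regularizations.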

2.2. An $L_1$-approximation upper bound. We now investigate the $L_1$-properties
of our mapping. To fix ideas, let us take the point of view of approximation
theory: let $I = [a, b]$ be a compact real interval and let $F \in \mathcal{H}(I)$ be an
approximation of some $G = \int_a^{\cdot} g(t)\,dt$, where $g$ is a positive U-shaped function
on $I$. Then $U_S(F)$ is a shape respecting approximation of $g$. We seek to
link the $L_1$-approximation quality of $g$ by $U_S(F)$ to the properties of the
underlying error $F - G$. In order to square with our stochastic framework,
we will have a more general approach: starting from some $F$ of $\mathcal{H}(I)$, we
want to control the $L_1$-distance between $U_S(F)$ and any U-shaped function
$g$.

From now on, we will adopt the following: 

Notation 1. (i) Let $I = [a,b]$ denote a compact real interval, let $f$ and
$g$ belong to $L_1(I)$ and let $\mathcal{H}$ be a subset of $L_1(I)$. We set

$$\|f - g\| = \int_I |f(t) - g(t)|\,dt \quad\text{and}\quad d(f, \mathcal{H}) = \inf_{h \in \mathcal{H}} \|f - h\|.$$

(ii) Let $\pi$ be a finite partition of $I$ in intervals such that
$\pi = ([t_{k-1}^{\pi}, t_k^{\pi}))_{k \in K^{\pi}}$.
Here, $a = t_0^{\pi} < \cdots < t_{D^{\pi}}^{\pi} = b$, $K^{\pi} = \{1, \ldots, D^{\pi}\}$ and $D^{\pi}$ is arbitrary. We
denote by $\Pi(I)$ the set of all such partitions.

(iii) Let $\pi \in \Pi(I)$. We denote by $\mathcal{H}_{\pi}$ the set of cadlag step functions based
on $\pi$ (we mean cadlag step functions that are constant on every
$[t_{k-1}^{\pi}, t_k^{\pi})$).

With the above notation, we can state our main theorem (the proof is 
postponed to Section 5): 

Theorem 1. Let $I = [a,b]$ be a compact real interval. Let $g$ be a
U-shaped function on $I$ and let $G = \int_a^{\cdot} g(t)\,dt$. Let $F \in \mathcal{H}(I)$ and let $U_S(F)$ be
defined by Definition 3. Setting $Z = F - G$ and $f = U_S(F)$, there exists some
absolute constant $C > 1$ such that

$$\|f - g\| \le \inf_{\pi \in \Pi(I)} \Big\{ 4\,d(g, \mathcal{H}_{\pi}) + C \sum_{k \in K^{\pi}} \sup_{t \in [t_{k-1}^{\pi}, t_k^{\pi}]} |Z(t) - Z(t_{k-1}^{\pi})| \Big\}. \tag{1}$$

($C = 49$ works.)

2.3. Comments.

Remark 1. Let us denote by $\mathcal{R}_Z(\pi)$ the term in brackets in (1). The
quantity $\mathcal{R}_Z(\pi)$ is the sum of a perturbation term $d(g, \mathcal{H}_{\pi})$, relying on the
smoothness of $g$, and of a regularization term measuring the approximation
error of $G$ both by $F$ and by a U-shaped regularization of $F$. Theorem 1 thus
stresses that the step function $U_S(F)$ realizes over $\Pi(I)$ the best compromise
between those two terms. Let us notice that the perturbation term varies
smoothly with $\pi$, while the regularization term achieves its infimum over
the subset of partitions whose endpoints belong to the set of discontinuity
points of $F$. Thus, the best trade-off will be reached for such a partition.

Remark 2. From an approximation theory point of view, such a tool,
which is able to make a sensitive choice of the image set $\mathcal{H}_{\pi}$ among all
partitions of $I$, is much more powerful than classical projection operators
mapping onto linear sets generated by a uniform partition. Indeed, let $\sigma$ and $\pi$,
respectively, denote a uniform and a nonuniform partition of $I$ of cardinality
$D$. Let $p_{\sigma} g$ and $p_{\pi} g$ be the orthogonal projections of some function $g$
on $\mathcal{H}_{\sigma}$ and $\mathcal{H}_{\pi}$, respectively. It is proved (see, e.g., [13]) that to achieve
an approximation error $\|g - p_{\sigma} g\| = O(D^{-\alpha})$, one needs to impose on $g$
smoothness conditions of the type $w(g) = O(D^{-\alpha})$ (where $w$ is the continuity
modulus of $g$), while for approximation by $p_{\pi} g$, the conditions are less
demanding: smoothness conditions on $g$ are needed only in the larger space
$L_{\gamma}$, $\gamma = (\alpha + 1)^{-1}$. Typically, for $g \in B_p^{\alpha}$ with $p < 1$, $\alpha = 1/p - 1$, the
approximation error is of order $D^{-\alpha}$ for $p_{\pi} g$, while for $p_{\sigma} g$ one has to impose
the condition $g \in B_1^{\alpha}$ to achieve this rate.

Remark 3. Let us briefly describe the unimodal case. Let $F \in \mathcal{H}(I)$ and
let $m \in I = [a, b]$. Symmetrically, we can define the unimodal regularization
$F_N^m$ at $m$ of $F$ on $I$ as follows: on $[a, m]$ it is the greatest convex minorant of
the restriction of $F$ to $[a, m]$; on $[m, b]$ it is the least concave majorant of the
restriction of $F$ to $[m, b]$. We then define $U_N^m(F)$ as its right-hand continuous
slope. Like $F_S^m$, the function $F_N^m$ is piecewise affine. But while $F_S^m$ is always
continuous, $F_N^m$ happens to be discontinuous at the point $m$ whenever $F$ is
discontinuous at this point. Next, in order to define the shape restricting
mapping $U_N$, we minimize on $I$ the function $d_N$ defined by

$$d_N(m) = \sup_{t \in I} |F(t) - F_N^m(t)|.$$

As shown by Birgé [7], $d_N$ is a continuous function on $I$ and its minimum
is achieved at a unique point $m(F)$, which is a continuity point of $F$. We
thus define the mapping $U_N$ from $\mathcal{H}(I)$ into the set of unimodal functions
on $I$ by $U_N(F) = U_N^{m(F)}(F)$. The practical construction of $U_N(F)$ is done
by applying the symmetric procedure with the PAVA, after having found
$m(F)$. This point may be computed numerically by using, for instance, a dichotomous
algorithm, after having found the interval where the minimum of $d_N$ is
achieved. Setting $f = U_N(F)$, (1) holds for every unimodal function $g$ on $I$.
The proof can be performed by exchanging the roles of the intervals $[a, m]$
and $[m, b]$.

Remark 4. The compactness assumption on $I$ is made for the sake of
simplicity. Actually it is sufficient to restrict oneself to intervals $I$ on which $G$
is bounded.

Remark 5. In this paper the underlying function $F$ is assumed to be
nondecreasing. This means that our approximation method is applied in
practice to positive functions $g$. On the other hand (see, e.g., the application
to the estimation of a regression function), one may wish to estimate
a function $g$ that is not positive. In such cases we will make the following
conjecture: at the expense of subsidiary technical complications, the
monotonicity restriction on $F$ can be dropped. Checking this conjecture amounts
mainly to showing that the minimum of the function $d_S$ is still well defined.

3. Statistical framework. The previous results can be applied in a statistical
context to build shape respecting estimators for U-shaped or unimodal
functions $g$ defined on a given interval $I = [a,b]$. Let $X$ be a random variable
whose law depends on $g$, where $g$ is assumed to belong to $\mathcal{U}_S(I)$ [resp.
$\mathcal{U}_N(I)$]. Let $\hat G \in \mathcal{H}(I)$ be an estimator of $G = \int_a^{\cdot} g(t)\,dt$ based on the
observation $X$ (e.g., $X$ is a sample generated by an unknown unimodal density $g$
and $\hat G$ is the empirical distribution function of the sample). We can apply
the mappings previously defined on $\hat G$ to build a shape respecting estimator
$\hat g$ of $g$.

The aim of this section is to study the nonasymptotic properties of this
estimator. We first state in Theorem 2 a stochastic version of Theorem 1
that gives control of the $L_1$-risk of $\hat g$. We next investigate some conditions
of optimality of this control.

In order to be clearer, we still restrict ourselves to the U-shaped case,
although the results still hold in the unimodal case. Moreover, we use Notation
1. As a straightforward consequence of Theorem 1, we get:

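Theorem 2. Let $g$ be a U-shaped function on $I = [a,b]$ and let $X$ be a
random variable whose law depends on $g$. Let $\hat G \in \mathcal{H}(I)$ be an estimator of
$G = \int_a^{\cdot} g(t)\,dt$ based on $X$ and let $U_S(\hat G)$ be defined by Definition 3. Setting
$Z = \hat G - G$ and $\hat g = U_S(\hat G)$, there exists some absolute constant $C > 1$ such
that

$$\mathbb{E}\|\hat g - g\| \le \inf_{\pi \in \Pi(I)} \mathcal{R}_Z(\pi), \tag{2}$$

where

$$\mathcal{R}_Z(\pi) = 4\,d(g, \mathcal{H}_{\pi}) + C \sum_{k \in K^{\pi}} \mathbb{E}\Big( \sup_{t \in [t_{k-1}^{\pi}, t_k^{\pi}]} |Z(t) - Z(t_{k-1}^{\pi})| \Big).$$

($C = 49$ works.)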

Theorem 2 emphasizes the adaptive behavior of our tool: without
advance knowledge of $g$, it makes a sensitive choice of the partition in $\Pi(I)$
that minimizes the quantity $\mathcal{R}_Z(\pi)$; this quantity takes the form of a risk
(it is the sum of a bias term and of the expectation of a random error term).
As $\hat g$ is by construction a histogram based on a random partition, such a
sensitive behavior leads us to wonder about its quality, compared to classical
histogram estimators of $g$.

Histogram estimators are common tools for estimating a density
function $g$ defined on some interval $I$: given an arbitrary partition $\pi \in \Pi(I)$, the
histogram estimator of $g$ based on $\pi$ is the empirical estimator of the orthogonal
projection of $g$ on $\mathcal{H}_{\pi}$. Similarly, given a general function $g$ and a step
function estimator $\hat G$ of $G$, we consider as an estimator of $g$ the histogram
$\hat g^{\pi}$ defined by

$$\hat g^{\pi}(t) = \sum_{k \in K^{\pi}} \frac{\hat G(t_k^{\pi}) - \hat G(t_{k-1}^{\pi})}{t_k^{\pi} - t_{k-1}^{\pi}}\, \mathbf{1}_{[t_{k-1}^{\pi}, t_k^{\pi})}(t) \quad \text{for all } t \in I. \tag{3}$$
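
The histogram built from a step function estimator, as in (3), can be sketched as follows. This is a minimal sketch under stated assumptions: `G_hat` is taken to be any Python callable returning the value of the cumulative estimator, and `partition` a sorted list of bin endpoints; both names, and the two helper functions, are illustrative only.

```python
from bisect import bisect_right


def histogram_heights(G_hat, partition):
    """Bin heights: increment of G_hat over each bin divided by the bin width."""
    return [
        (G_hat(right) - G_hat(left)) / (right - left)
        for left, right in zip(partition[:-1], partition[1:])
    ]


def histogram_value(partition, heights, t):
    """Evaluate the histogram at t, with bins closed on the left and open on the right."""
    k = bisect_right(partition, t) - 1
    k = min(max(k, 0), len(heights) - 1)  # clamp so that the right endpoint uses the last bin
    return heights[k]


# Example: empirical distribution function of a small sample on [0, 1].
sample = [0.1, 0.2, 0.3, 0.8]
G_hat = lambda t: sum(1 for v in sample if v <= t) / len(sample)
heights = histogram_heights(G_hat, [0.0, 0.5, 1.0])
```

With this sample and the two-bin partition, three of the four points fall in the first bin, so the heights are 1.5 and 0.5 and the histogram integrates to one.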

We now investigate conditions on $\hat G$ under which $\hat g$ turns out to do at least
as well as any variable binwidth histogram $\hat g^{\pi}$ of $g$ built from $\hat G$. For that
task we can show the following result (see Section 6 for the proof).

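Theorem 3. Let $\hat g^{\pi}$ be defined by (3). Assume that the conditions of
Theorem 2 hold. Assume moreover that there exists some positive constant
$A$ such that for all $\pi \in \Pi(I)$ and all $k \in K^{\pi}$,

$$\mathbb{E}\Big( \sup_{t \in [t_{k-1}^{\pi}, t_k^{\pi}]} |Z(t) - Z(t_{k-1}^{\pi})| \Big) \le A\, \mathbb{E}\big( |Z(t_k^{\pi}) - Z(t_{k-1}^{\pi})| \big). \tag{4}$$

Then we get for all $\pi \in \Pi(I)$,

$$(CA + 8)^{-1} \mathcal{R}_Z(\pi) \le \mathbb{E}\|\hat g^{\pi} - g\| \le \mathcal{R}_Z(\pi). \tag{5}$$

Moreover,

$$\mathbb{E}\|\hat g - g\| \le (CA + 8) \inf_{\pi \in \Pi(I)} \mathbb{E}\|\hat g^{\pi} - g\|. \tag{6}$$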

The quality of estimation appears to rely on the expected variations of the
process $Z = \hat G - G$ on $I$: for sufficiently small ones, $\hat g$ generates on its own
a partition which is optimal from an $L_1$-risk point of view. In particular, it
will do as well, up to a constant factor, as any variable binwidth histogram.
Moreover, it is totally data-driven, which is a serious practical advantage
since it allows one to solve the problem of how to select the best partition.
Further applications will show that the condition (4) often holds.

Remark 6. Theorem 2 gives control of the $L_1$-risk of the estimator of
a U-shaped function $g$ with unknown minimum point. Now, suppose that $g$
is U-shaped and that the location of its minimum $m$ is known. An obvious
estimator $\hat g_m$ of $g$ can be obtained via the basic mapping $U_S^m$ of Definition
2 as $\hat g_m = U_S^m(\hat G)$ (when $g$ is a decreasing density, it is merely a Grenander
estimate). An upper bound for the $L_1$-risk of this estimator is derived in
Section 5.1 as a by-product of the proof of Theorem 1. Indeed, it follows
from Lemma 1 that there exists some absolute constant $C'$ such that

$$\mathbb{E}\|\hat g_m - g\| \le \inf_{\pi \in \Pi(I)} \Big\{ 4\,d(g, \mathcal{H}_{\pi}) + C' \sum_{k \in K^{\pi}} \mathbb{E}\Big( \sup_{t \in [t_{k-1}^{\pi}, t_k^{\pi}]} |Z(t) - Z(t_{k-1}^{\pi})| \Big) \Big\}. \tag{7}$$

Hence, even if the point $m$ is known, one does not lose much by assuming
that it is unknown and (2) is still a good control for the $L_1$-risk of $\hat g_m$, from
a qualitative point of view.

4. Applications. We now present examples of application of our study. 
We first consider the shape respecting estimation of two classical functions: 
that of a unimodal density, which allows one to recover Birgé's results, and
that of a unimodal regression function. For this task, we use the mapping $U_N$
defined in Remark 3. Our main focus is on the estimation of classical inten- 
sity functions used in reliability theory, in realistic designs: for nonreparable 
systems, we study the shape respecting estimator of a U-shaped hazard rate 
in right-censoring life data models. In the reparable system field, we study 
the shape respecting estimator of the U-shaped failure rate of a nonhomo- 
geneous Poisson process, based on the observation of a single process on a 
finite time period. Such a process is widespread in reparable systems studies, 
since it models the failure behavior of a system having a large number of 
units and whose repair times can be disregarded. It is totally characterized 
by its failure rate. In these two last applications, the U-shaped assumption 
is very natural and corresponds to the situation where a system is observed 
during its entire lifetime. Moreover, the adaptivity property is particularly 
important here since realistic reliability designs are often underinformed.

For each application, $I = [a, b]$ is an interval and $\hat G \in \mathcal{H}(I)$ is an estimator
of $G = \int_a^{\cdot} g(t)\,dt$, where $g$ is the function under estimation. Moreover, we
adopt Notation 1 and the following:

Notation 2. We denote by $\Pi_D(I)$ the subset of $\Pi(I)$ of partitions in
$D$ intervals. Next, for a given $\pi \in \Pi(I)$, we call $\hat g^{\pi}$ the histogram estimator
of $g$, defined by (3).

Short proofs of the results presented in the sequel are postponed to Sec- 
tion 7. 

4.1. Estimation of a unimodal density. Let $(X_1, \ldots, X_n)$ be a sample
generated by an absolutely continuous distribution $G$ with density $g$. Assume
that the restriction of $g$ to a given real interval $I$ is unimodal ($I$ can be the
real line here, see Remark 4). The shape respecting estimator of $g$ on $I$ can
be defined by $\hat g = U_N(\hat G)$, where

$$\hat G(t) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{X_i \le t} \quad \text{for all } t \in I.$$
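
Since the empirical distribution function is a nondecreasing cadlag step function, it is fully described by its jump points and the values taken there. The sketch below returns that description on a bounded interval; the function name `ecdf_knots` and its arguments are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def ecdf_knots(sample, a, b):
    """Knots and values of the empirical distribution function restricted to [a, b]."""
    ordered = sorted(sample)
    n = len(ordered)
    # Keep the endpoints and every distinct observation strictly inside (a, b).
    knots = sorted({a, b, *[v for v in ordered if a < v < b]})
    # Value at each knot: proportion of observations less than or equal to it.
    values = [bisect_right(ordered, k) / n for k in knots]
    return knots, values
```

Ties in the sample produce a single knot with a larger jump, which keeps the knots strictly increasing.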
As an application of Theorems 2 and 3, we get: 

Proposition 1. There exists an absolute positive constant $E$ such that

$$\mathbb{E}\|\hat g - g\| \le \inf_{D \ge 1} \Big\{ \inf_{\pi \in \Pi_D(I)} 4\,d(g, \mathcal{H}_{\pi}) + E \sqrt{\frac{D}{n}} \Big\}. \tag{8}$$

Moreover, there exists an absolute positive constant $K$ such that

$$\mathbb{E}\|\hat g - g\| \le K \inf_{\pi \in \Pi(I)} \mathbb{E}\|\hat g^{\pi} - g\|. \tag{9}$$

Inequality (8) is similar from a qualitative point of view to Birgé's Theorem 1
[7] and gives an idea of how the shape respecting estimator operates: it first
chooses among the $D$-dimensional linear subsets of step functions the one
which is closest to the unknown function $g$; it then selects the dimension $D$
which realizes the best trade-off between the bias and the error terms of the
estimation. Moreover, (9) shows that the selected partition is optimal from
a nonasymptotic $L_1$-risk point of view. This point had been investigated
by Birgé from a minimax point of view, since he showed that his estimator
nearly achieves the minimax risk over the class of unimodal densities with
bounded support. We get here a result for every unimodal density function.

4.2. Estimation of a unimodal regression function. Another classical
problem is that of the estimation of a unimodal regression function. Let us
consider here the model

$$Y_i = g(x_i) + \varepsilon_i, \quad i = 1, \ldots, n,$$

where $Y_i$ is the observation at time $x_i = i/n$, the $\varepsilon_i$'s are i.i.d. Gaussian
centered errors with variance $\sigma^2 > 0$ and $g$ is a unimodal function on
$I = [0, 1]$. We can define the shape respecting estimator of $g$ by $\hat g = U_N(\hat G)$, where

$$\hat G(t) = \frac{1}{n} \sum_{i=1}^{n} Y_i\, \mathbf{1}_{x_i \le t} \quad \text{for all } t \in [0, 1].$$

Let us notice that $\hat G$ is not nondecreasing in general even if $g$ is positive.
We thus need to assume that the conjecture of Remark 5 holds in order to
define and study $\hat g$.

Proposition 2. Assume that the conjecture of Remark 5 holds. Then
there exists some positive constant $E$ that depends only on $\sigma$ and
$M = \sup_{t \in [0,1]} g(t)$ such that (8) holds. Moreover, (9) holds for some absolute
positive constant $K$.

Remark 7. The assumption $x_i = i/n$, $i = 1, \ldots, n$, can be dropped as
soon as the $x_i$'s are approximately uniformly spread in $[0, 1]$. Moreover, no
particular assumption is required on the distribution family of the errors to
get (8), and the common distribution has only to be bounded (not necessarily
Gaussian) to get (9).

4.3. Estimation of a U-shaped hazard rate in right-censoring life data
models. Suppose one observes $n$ copies of a nonreparable system in a usual
right-censoring scheme: let $(T_1, \ldots, T_n)$ be the potential lifetimes of the $n$
copies, generated by an absolutely continuous distribution $F$ with density
$f$. Let $(U_1, \ldots, U_n)$ be the sample of censoring times (this means that $U_j$
is the time beyond which the $j$th copy can no longer be observed). Assume
moreover that the $U_j$'s are independent of the $T_j$'s. The random variable $T_j$
will be observed whenever $T_j \le U_j$. The set of observable data is thus given
by

$$\{X_j = T_j \wedge U_j,\ \delta_j = \mathbf{1}_{X_j = T_j},\ j = 1, \ldots, n\}.$$

We want to estimate the hazard rate $g = f/(1 - F)$ of the system's lifetime
$T$ on some compact real interval $I = [0,c]$, under the assumption that $g$ is
U-shaped on $I$. A shape respecting estimator $\hat g = U_S(\hat G)$ of $g$ on $I$ can be
derived from the Nelson-Aalen estimator [23] of the log-survival function $G$.
It is defined on $I$ by

$$\hat G(t) = \int_0^t \frac{d\hat F(s)}{1 - \hat F^{-}(s)}.$$

Here $\hat F$ is the Kaplan-Meier product-limit estimator of $F$ [18] and $\hat F^{-}$ is its
left-hand continuous version. We get:

Proposition 3. Assume that $L(c) < 1$, where $L$ is the common distribution
function of the $X_j$'s. Then there exists an absolute constant $C$ such
that (8) holds with $E = C(1 - L(c))^{-1/2}$.
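
When the distribution estimator is the Kaplan-Meier estimator, each of its jumps at an observed failure time equals the left-hand survival estimate times the ratio of failures to units still at risk, so the integral defining the Nelson-Aalen estimator reduces to a running sum of such ratios. The sketch below computes that sum; it is an illustration only, and the names `times` (the observed $X_j$) and `events` (the indicators $\delta_j$) are assumptions made for the example.

```python
def nelson_aalen(times, events):
    """Return (jump_times, cumulative_hazard) from right-censored observations."""
    # Distinct observed failure times, in increasing order.
    failure_times = sorted({t for t, d in zip(times, events) if d})
    jump_times, cumulative = [], []
    total = 0.0
    for t in failure_times:
        at_risk = sum(1 for x in times if x >= t)  # units still under observation at t
        failures = sum(1 for x, d in zip(times, events) if d and x == t)
        total += failures / at_risk
        jump_times.append(t)
        cumulative.append(total)
    return jump_times, cumulative
```

For example, with observations `[1, 2, 3, 4]` and indicators `[1, 0, 1, 1]`, the jumps are 1/4, 1/2 and 1, so the cumulative values are 0.25, 0.75 and 1.75. The growing size of the last jumps illustrates why the error term deteriorates as the number of units at risk decreases.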

The sensitive behavior of our tool is a serious advantage here. Indeed, one
of the typical drawbacks of hazard rate estimation methods is that the error
term of the estimation depends on a penalization factor relying on the value
of $L$ near the right-hand endpoint $c$ of the estimation interval, in such a way
that for classical methods such as fixed bandwidth kernels or fixed binwidth
histograms, the local error term near this point tends toward infinity. Using $\hat g$
mitigates the problem. Here the penalization factor is $(1 - L(c))^{-1/2}$, but the
partition chosen by the estimate will probably lead to a larger local binwidth
when estimating near $c$ than when estimating in the interior. Moreover, the
automatic choice of the histogram binwidths is a serious practical advantage
and its good behavior holds even for moderate sample sizes. For all these
reasons, our estimate should be preferred (at least when the hazard rate is
not known to be smooth) to data-adaptive bandwidth kernel estimators, for
which the choice of the optimal local bandwidths relies on heavy procedures
of asymptotic minimization of a mean square error estimate.

Remark 8. The definition and properties of the shape respecting
estimator of $g$ in complete life data models can be straightforwardly derived
from the former results, setting $L = F$.

4.4. Estimation of the U-shaped failure rate of a nonhomogeneous Poisson
process. Let $(N(t))_{t \ge 0}$ be a nonhomogeneous Poisson process with mean
function $\mathbb{E}(N(t)) = G(t)$. Suppose that $(N(t))_{t \ge 0}$ describes the number of
failures in time of a reparable system. The failure rate of the process is,
when it exists, the derivative $g$ of $G$.

In the sequel, we propose to estimate $g$ on a given finite time period
$I = [0,T]$, where it is known to be U-shaped. For this task, we observe the
failure times $(T_1, \ldots, T_{N(T)})$ falling in $I$ of a single copy of the system. Let
us define on $I$ the estimator $\hat g = U_S(\hat G)$, where

$$\hat G(t) = \sum_{k=1}^{N(T)} \mathbf{1}_{T_k \le t} = N(t).$$

In order to describe the $L_1$-behavior of $\hat g$ on $I$, we need to use a normalized
$L_1$-distance (which allows one to recover the $L_1$-distance between two
constant rates on $I$). We thus define for all $f, g \in L_1(I)$

$$\|f - g\| = \frac{1}{T} \int_I |f(t) - g(t)|\,dt.$$

We can show that: 

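Proposition 4. There exists an absolute positive constant $B$ such that

$$\mathbb{E}\|\hat g - g\| \le \inf_{D \ge 1} \Big\{ \inf_{\pi \in \Pi_D(I)} 4\,d(g, \mathcal{H}_{\pi}) + B \sqrt{\frac{G(T)}{T}} \sqrt{\frac{D}{T}} \Big\}. \tag{10}$$

Moreover, there exists an absolute positive constant $K$ such that (9) holds.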

The main difference between failure rate estimation and the previously 
studied estimation problems is that the underlying observations $(T_1, \ldots, T_{N(T)})$ 
are not i.i.d., except in the case where g is constant. Therefore, in the 
realistic context of the observation of a single system (or a small number of 
copies of the system), nonadaptive methods are totally misleading. Now 
let us investigate in some particular cases the quality of $\tilde g$ with regard to 
classical estimators: in the case where $g = \lambda$, (10) allows one to recover the 
parametric rate. Indeed, the parametric maximum likelihood estimate of g is 
given in this case by $\hat g = N(T)/T$ and satisfies $E\|\hat g - \lambda\| \le \sqrt{\lambda/T}$, while our 
bound gives the same, up to a multiplicative constant. (More generally, we 
find rates of order $T^{-1/2}$ whenever g is a step function.) Moreover, when the 
penalization factor $\sqrt{G(T)/T}$ involved in (10) is bounded, which includes 
nonincreasing and low nondecreasing functions g, the order of magnitude of 
the asymptotic risk will be at least $T^{-1/3}$ as for densities. More generally, 
from a nonasymptotic point of view, we hope to obtain a good estimator as 
soon as the penalization factor is not too large. In some cases, this factor 
can be very large, so that the quality of the estimate can 
be quite bad, for instance for high increasing rates. (Fortunately, we are 
not interested in such situations since they practically correspond to a 
substantial deterioration of the system, which is then withdrawn from service 
as soon as possible.) Nevertheless, even in unfavorable situations, the locally 
sensitive property of $\tilde g$ will allow one to detect break points in the slope of 
g. This fact is interesting for its own sake for trend studies. 
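As a numerical illustration of the constant-rate case $g = \lambda$ discussed above, the following sketch computes $E|N(T)/T - \lambda|$ exactly from the Poisson distribution of N(T) and compares it with $\sqrt{\lambda/T}$. The function name `mle_l1_risk` and the parameter values are illustrative assumptions, not part of the paper.

```python
import math

def mle_l1_risk(lam: float, T: float, tail: int = 200) -> float:
    """Exact E|N(T)/T - lam| for N(T) ~ Poisson(lam * T), by summing the pmf."""
    mu = lam * T
    # Sum far enough into the tail that the neglected mass is negligible.
    n_max = int(mu + 20.0 * math.sqrt(mu) + tail)
    log_p = -mu                                  # log P(N = 0)
    risk = 0.0
    for n in range(n_max + 1):
        risk += math.exp(log_p) * abs(n / T - lam)
        log_p += math.log(mu) - math.log(n + 1)  # log P(N = n + 1)
    return risk

for lam, T in [(0.5, 10.0), (2.0, 50.0), (5.0, 200.0)]:
    risk = mle_l1_risk(lam, T)
    bound = math.sqrt(lam / T)
    print(f"lam={lam}, T={T}: risk={risk:.4f}, sqrt(lam/T)={bound:.4f}")
```

In each case the printed risk should stay below $\sqrt{\lambda/T}$, as the Cauchy-Schwarz inequality guarantees.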

Remark 9. An adaptive estimator of g has been proposed by Barlow, 
Proschan and Scheuer [2] for decreasing failure rates. It is defined as the non- 
parametric maximum likelihood estimator over this shape restricted class. 
Although it has been used successfully in practice (it behaves reasonably 
well compared with classical parametric models used in industrial reliability 
studies), its properties have not been investigated so far. The shape respect- 
ing estimator we present and study in the sequel generalizes Barlow et al.'s 
estimator to U-shaped failure rates. Since a decreasing g can be seen as a 
degenerate U-shaped function, with minimum point at the end of the es- 
timation interval, it is worth noticing that our study also applies to their 
estimate (see Remark 6). 

5. Proof of Theorem 1. Let m be a point of I where g achieves its 
minimum. Let $F^m$ denote the U-shaped regularization of F at m (see Definition 
2) and set $f^m = U_I^m(F)$. We shall prove the following two lemmas in Sections 
5.1 and 5.2, respectively (we use Notation 1). 

Lemma 1. There exists an absolute constant C' > 1 such that for all 
$\pi \in \Pi(I)$, 

(11) $$\|f^m - g\| \le 4d(g, \mathcal{H}_\pi) + C' \sum_{k \in \mathcal{K}_\pi} \sup_{t \in I_k} |Z(t) - Z(t_{k-1})|.$$

Lemma 2. We have 

$$\|f^m - \tilde f\| \le 4 \sup_{t \in I} |Z(t)|.$$

For all $\pi \in \Pi(I)$ and all k, we thus get 

$$\sup_{t \in I_k} |Z(t)| \le \sum_{i=1}^{k} \sup_{t \in I_i} |Z(t) - Z(t_{i-1})|,$$

so that 

$$\sup_{t \in I} |Z(t)| \le \sum_{k \in \mathcal{K}_\pi} \sup_{t \in I_k} |Z(t) - Z(t_{k-1})|.$$

Moreover, 

$$\|\tilde f - g\| \le \|f^m - g\| + \|\tilde f - f^m\|$$

and Theorem 1 follows from Lemma 1 and Lemma 2 with C = C' + 4. 

5.1. Proof of Lemma 1. For the sake of simplicity, we omit the subscript 
$\pi$ in Notation 1 and we adopt the following extra notation: 

Notation 3. (i) For every subinterval J of I and all functions f and g 
in $L_1(J)$, we denote by l(J) the length of J and we set 

$$g(J) = \frac{1}{l(J)} \int_J g(t)\,dt,$$

$$b_g(J) = \int_J |g(t) - g(J)|\,dt \quad\text{and}\quad \|f - g\|(J) = \int_J |f(t) - g(t)|\,dt.$$

(ii) For all $k \in \mathcal{K}$, we set $I_k = [t_{k-1}, t_k]$. 

Note first that we may assume without loss of generality that there exists 
some $j \in \mathcal{K}$ such that $m = t_j$. Indeed, assume that Lemma 1 holds over the 
restricted class of partitions of I including m, for some absolute constant C''. 
Then let $\pi^m$ be a partition of I with D + 1 endpoints $t_k^m$ such that $t_j^m = m$, 
and let $\pi$ be the partition of I with D endpoints $t_k$ such that $t_k = t_k^m$ for all 
k < j and $t_k = t_{k+1}^m$ for all $k \ge j$. For the term in j in (11), we get 

$$\sup_{t \in I_j^m} |Z(t) - Z(t_{j-1}^m)| + \sup_{t \in I_{j+1}^m} |Z(t) - Z(t_j^m)| \le 2 \sup_{t \in I_j} |Z(t) - Z(t_{j-1})| + |Z(t_j^m) - Z(t_{j-1})| \le 3 \sup_{t \in I_j} |Z(t) - Z(t_{j-1})|.$$

Moreover, since $\pi^m$ refines $\pi$, 

$$d(g, \mathcal{H}_{\pi^m}) \le d(g, \mathcal{H}_\pi).$$

Thus, we obtain (11) over the set of all partitions of I by adding constants 
(C' = 3C''). 

The main argument to show (11) is Proposition 1 of [5], which is the 
formal translation of the PAVA. Let us recall this result: 

Lemma 3 (Birgé). Suppose that we are given a nondecreasing integrable 
function h and a nonincreasing integrable function g on some finite interval 
J. Then, using Notation 3, 

$$\|h(J) - g\|(J) \le \|h - g\|(J).$$

(1) Let us first see what is happening on [a,m] in the case where a < m. 
Recall that $t_j = m$. For $k \le j$, define $F_k$ as the least concave majorant of the 
restriction of F on $I_k$. Next, let us define $H_0$ on [a,m] by $H_0(m) = F_j(m)$ 
and 

$$H_0(t) = \sum_{k \le j} F_k(t)\, 1_{[t_{k-1}, t_k)}(t),$$

and let $h_0$ be the right-hand continuous slope of $H_0$. Then $H_0$ is a piecewise 
affine, continuous function such that on [a,m], $F \le H_0 \le F^m$. Moreover, $h_0$ 
is well defined on [a,m) and right-hand continuous. Its discontinuity points 
belong to X, where $X = \{x_0, x_1, \ldots, x_n\}$ is the ordered set obtained as the 
union of the set $\{t_0, t_1, \ldots, t_j\}$ and that of the discontinuity points of F. 
Now, for $l \ge 1$, let us define $H_l$ and $h_l$ by iterating the following rule (we 
apply the PAVA to $H_{l-1}$): 

(a) If $h_{l-1}$ is nonincreasing on [a,m), then we define $h_l = h_{l-1}$ and 
$H_l = H_{l-1}$. 

(b) If $h_{l-1}$ is not nonincreasing on [a,m), then there exists some $0 \le i < n$ 
such that $h_{l-1}(x_{i+1}) > h_{l-1}(x_i)$. Let us define 

$$i_- = \min_{0 \le k < n} \{k : h_{l-1}(x_{k+1}) > h_{l-1}(x_k)\}$$

and 

$$i_+ = \min_{i_- < k < n} \{k : h_{l-1}(x_{k+1}) < h_{l-1}(x_k)\},$$

where by convention, $\inf \emptyset = m$. Then define $H_l$ such that $H_l = H_{l-1}$ on 
$[a, x_{i_-}] \cup [x_{i_+}, m]$ and $H_l$ is affine on $[x_{i_-}, x_{i_+}]$. Since $H_l$ is a piecewise affine, 
continuous function on [a,m], we can define its right-hand continuous slope 
$h_l$. We thus get 

$$h_l = h_{l-1} \quad\text{on } J = [a, x_{i_-}) \cup [x_{i_+}, m),$$

$$h_l = h_{l-1}(J) \quad\text{on } J = [x_{i_-}, x_{i_+}).$$

For all $l \ge 1$, the function $H_l$ is piecewise affine, continuous and 
$F \le H_{l-1} \le H_l \le F^m$ on [a,m]. The function $h_l$ is a cadlag step function on 
[a,m) with discontinuity points in X. Moreover, using Lemma 3, we get 
$\|h_l - g\|([a,m)) \le \|h_{l-1} - g\|([a,m))$. Therefore, 
$\|h_l - g\|([a,m)) \le \|h_0 - g\|([a,m))$. 
Within a finite number of iterations, we get $H_l = H_{l-1}$ and $h_l = h_{l-1}$. 
Let $l_0 + 1$ be the first step where it happens. Then $h_{l_0}$ is a nonincreasing 
function, so $H_{l_0}$ is concave. By definition of $F^m$, we thus get $H_{l_0} \ge F^m$ and 
thus, $H_{l_0} = F^m$ on [a,m]. Therefore, $h_{l_0} = f^m$ on [a,m). Finally, 

(12) $$\|f^m - g\|([a,m)) \le \|h_0 - g\|([a,m)).$$
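The pooling rule used in step (1) can be illustrated on step functions. The sketch below is a standard stack-based variant of the pool adjacent violators algorithm, not a transcription of rules (a) and (b): it processes the cells from left to right instead of searching for the first violating interval, but each merge is the same replacement of a violating run by its length-weighted mean. The names `pava_nonincreasing` and `l1_distance` and the data are hypothetical. The final print also checks, on this example, the $L_1$ contraction stated in Lemma 3 against a nonincreasing target g.

```python
from fractions import Fraction

def pava_nonincreasing(values, lengths):
    """Pool adjacent violators: nonincreasing step function obtained by replacing
    each violating run of cells by its length-weighted mean."""
    blocks = []  # each block: [weighted mean, total length, number of cells]
    for v, w in zip(values, lengths):
        blocks.append([Fraction(v), Fraction(w), 1])
        # A violation of "nonincreasing" is a block larger than its left neighbour.
        while len(blocks) > 1 and blocks[-1][0] > blocks[-2][0]:
            v2, w2, c2 = blocks.pop()
            v1, w1, c1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted

def l1_distance(f, g, lengths):
    """L1 distance between two step functions defined on the same cells."""
    return sum(abs(Fraction(a) - Fraction(b)) * Fraction(w)
               for a, b, w in zip(f, g, lengths))

h0 = [1, 3, 2, 0]      # initial slopes, not nonincreasing
g = [3, 2, 1, 0]       # a nonincreasing target
cells = [1, 1, 1, 1]   # cell lengths
h = pava_nonincreasing(h0, cells)
print(h, l1_distance(h, g, cells), l1_distance(h0, g, cells))
```

Exact rational arithmetic is used so that the comparison of the two distances is not affected by rounding.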

(2) Let us now see what is happening on [m,b] in the case where m < b. 
The same holds if we replace F by its left-hand continuous version $F^-$ 
on (m,b], setting moreover $F^-(m) = F(m)$. Let us define $H_0$ on [m,b] as 
the piecewise affine, continuous function such that on each $I_k$, k > j, $H_0$ 
is the greatest convex minorant of the restriction of $F^-$ to $I_k$. Let $h_0$ be 
its right-hand continuous slope. We iterate the symmetric rule: if $h_0$ is not 
nondecreasing, let $[x_{i_-}, x_{i_+})$ be the first interval on which it happens. On 
this interval, we replace $h_0$ by its mean and $H_0$ by an affine function. We 
iterate this rule until we obtain a nondecreasing function $h_{l_0}$. We can check 
that $h_{l_0} = f^m$ on [m,b) and that $\|h_{l_0} - g\|([m,b)) \le \|h_0 - g\|([m,b))$. Thus, 

(13) $$\|f^m - g\|([m,b)) \le \|h_0 - g\|([m,b)).$$

By summation of (12) and (13), we get 

(14) $$\|f^m - g\| \le \|h_0 - g\|.$$

Now, by a straightforward decomposition, we get on each $I_k$ 

(15) $$\|h_0 - g\|(I_k) \le b_{h_0}(I_k) + |h_0(I_k) - g(I_k)|\, l(I_k) + b_g(I_k).$$

(3) Let $k \le j$; by definition of $H_0$, we get $H_0(t_k) = F(t_k)$ and 
$H_0(t_{k-1}) = F(t_{k-1})$. Therefore, 

(16) $$|h_0(I_k) - g(I_k)|\, l(I_k) = |Z(t_k) - Z(t_{k-1})| \le \sup_{t \in I_k} |Z(t) - Z(t_{k-1})|.$$

To compute the first term on the right-hand side of (15), we will use the 
following result (see Section 5.3 for the proof): 

Lemma 4. Let h be a nonincreasing function on $J = [t_0, t_1]$. Let 
$H = \int h(t)\,dt$. Then, using Notation 3, 

$$b_h(J) = 2 \sup_{t \in J} \big( H(t) - H(t_0) - (t - t_0) h(J) \big).$$

We apply Lemma 4 to the nonincreasing function $h_0$ on $I_k$. Since 
$H_0(t_{k-1}) = F(t_{k-1})$, $H_0(t_k) = F(t_k)$ and $H_0 \ge F$ on $I_k$, we get 

(17) $$b_{h_0}(I_k) \ge 2 \sup_{t \in I_k} \Big( F(t) - F(t_{k-1}) - \frac{t - t_{k-1}}{l(I_k)} \big(F(t_k) - F(t_{k-1})\big) \Big).$$

But the restriction of $H_0$ to $I_k$ is a function whose slope can only change 
at the discontinuity points of F where $H_0$ hits F, so that the supremum in 
Lemma 4 is achieved at such points. Let us call this set Y. We get 

(18) $$b_{h_0}(I_k) = 2 \sup_{t \in Y} \Big( F(t) - F(t_{k-1}) - \frac{t - t_{k-1}}{l(I_k)} \big(F(t_k) - F(t_{k-1})\big) \Big).$$

Equations (17) and (18) yield 

(19) $$b_{h_0}(I_k) = 2 \sup_{t \in I_k} \Big( F(t) - F(t_{k-1}) - \frac{t - t_{k-1}}{l(I_k)} \big(F(t_k) - F(t_{k-1})\big) \Big).$$

A last decomposition of the right-hand side of (19), using the concavity of 
G, gives 

$$b_{h_0}(I_k) \le 4 \sup_{t \in I_k} |Z(t) - Z(t_{k-1})| + 2 \sup_{t \in I_k} \big( G(t) - G(t_{k-1}) - (t - t_{k-1}) g(I_k) \big).$$

Finally, applying Lemma 4 to g, we get 

(20) $$b_{h_0}(I_k) \le 4 \sup_{t \in I_k} |Z(t) - Z(t_{k-1})| + b_g(I_k).$$

Replacing (16) and (20) in (15) leads to 

(21) $$\|h_0 - g\|(I_k) \le 5 \sup_{t \in I_k} |Z(t) - Z(t_{k-1})| + 2 b_g(I_k).$$

(4) Let k > j; by definition of $H_0$, we get $H_0(t_{k-1}) = F^-(t_{k-1})$ and 
$H_0(t_k) = F^-(t_k)$. Setting $Z^- = F^- - G$, the second term on the right-hand 
side of (15) gives 

(22) $$\begin{aligned} |h_0(I_k) - g(I_k)|\, l(I_k) &= |Z^-(t_k) - Z^-(t_{k-1})| \\ &\le |Z^-(t_k) - Z(t_{k-1}) + Z(t_{k-1}) - Z(t_{k-2}) + Z(t_{k-2}) - Z^-(t_{k-1})| \\ &\le \sup_{t \in I_k} |Z(t) - Z(t_{k-1})| + 2 \sup_{t \in I_{k-1}} |Z(t) - Z(t_{k-2})|. \end{aligned}$$

To compute the first term on the right-hand side of (15), we apply Lemma 
4 to the right-hand continuous nonincreasing function $-h_0$ on $I_k$, 

$$b_{h_0}(I_k) = 2 \sup_{t \in I_k} \big( H_0(t_{k-1}) - H_0(t) + (t - t_{k-1}) h_0(I_k) \big).$$

By construction, $H_0(t_{k-1}) = F^-(t_{k-1})$, $H_0(t_k) = F^-(t_k)$ and $H_0 \le F^-$ on $I_k$. 
On the other hand, the slope of $H_0$ on $I_k$ can only change at the discontinuity 
points of $F^-$ such that $F^-(t) = H_0(t)$. Thus, using the same scheme as for 
(19), we get 

$$b_{h_0}(I_k) = 2 \sup_{t \in I_k} \Big( F^-(t_{k-1}) - F^-(t) + \frac{t - t_{k-1}}{l(I_k)} \big(F^-(t_k) - F^-(t_{k-1})\big) \Big).$$
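Let $(y_j, y_{j+1}]$ be the jth interval in $I_k$ on which $F^-$ is continuous. Then 

$$b_{h_0}(I_k) = 2 \sup_j \sup_{t \in (y_j, y_{j+1}]} \Big( F^-(t_{k-1}) - F^-(t) + \frac{t - t_{k-1}}{l(I_k)} \big(F^-(t_k) - F^-(t_{k-1})\big) \Big).$$

By continuity, the supremum on $(y_j, y_{j+1}]$ equals the supremum on the open 
interval $(y_j, y_{j+1})$, on which $F^- = F$. Therefore, 

$$\begin{aligned} b_{h_0}(I_k) &= 2 \sup_j \sup_{t \in (y_j, y_{j+1})} \Big( F^-(t_{k-1}) - F(t) + \frac{t - t_{k-1}}{l(I_k)} \big(F^-(t_k) - F^-(t_{k-1})\big) \Big) \\ &\le 2 \sup_{t \in I_k} \Big( F^-(t_{k-1}) - F(t) + \frac{t - t_{k-1}}{l(I_k)} \big(F^-(t_k) - F^-(t_{k-1})\big) \Big). \end{aligned}$$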

A straightforward decomposition using Lemma 4 yields 

$$b_{h_0}(I_k) \le 2 \sup_{t \in I_k} |Z^-(t_{k-1}) - Z(t)| + 2 |Z^-(t_k) - Z^-(t_{k-1})| + b_g(I_k).$$

Finally, 

(23) $$b_{h_0}(I_k) \le 4 \sup_{t \in I_k} |Z(t) - Z(t_{k-1})| + 8 \sup_{t \in I_{k-1}} |Z(t) - Z(t_{k-2})| + b_g(I_k).$$

Replacing (22) and (23) in (15) leads to 

$$\|h_0 - g\|(I_k) \le 5 \sup_{t \in I_k} |Z(t) - Z(t_{k-1})| + 10 \sup_{t \in I_{k-1}} |Z(t) - Z(t_{k-2})| + 2 b_g(I_k)$$

for all k > j. By (21) this inequality holds for all $k \in \mathcal{K}$, and by summation 

(24) $$\|h_0 - g\| \le \sum_{k \in \mathcal{K}} \Big( 15 \sup_{t \in I_k} |Z(t) - Z(t_{k-1})| + 2 b_g(I_k) \Big).$$

Now let $p_\pi g$ denote the $L_2$-orthogonal projection of g on $\mathcal{H}_\pi$, that is, 

(25) $$p_\pi g(t) = \sum_{k \in \mathcal{K}} g(I_k)\, 1_{[t_{k-1}, t_k)}(t) \quad\text{for all } t \in I.$$

We get 

$$\sum_{k \in \mathcal{K}} b_g(I_k) = \|p_\pi g - g\|.$$

Setting $h \in \mathcal{H}_\pi$ and with $p_\pi h$ its $L_2$-orthogonal projection on $\mathcal{H}_\pi$, we thus 
get $p_\pi h = h$ and then 

(26) $$\|p_\pi g - g\| \le 2 \|g - h\|.$$

Therefore, 

(27) $$\sum_{k \in \mathcal{K}} b_g(I_k) \le 2 d(g, \mathcal{H}_\pi).$$

Substituting (27) in (24) completes the proof of Lemma 1, since we have 
(14), and C' = 45 works. 
5.2. Proof of Lemma 2. The key arguments here are Marshall's lemma 
(see, e.g., [1]) and the following lemma (the proof of this lemma is omitted 
since it can be presented in the same way as that of Lemma 1 of [4]): 

Lemma 5. Let $F \in \mathcal{H}(I)$ and let $F^r$ and $F^s$ be the U-shaped 
regularizations of F on I at r and s, respectively, with r < s. Let $f^r$ and $f^s$ be their 
right-hand continuous slopes. Then 

$$\|f^r - f^s\| = 2 \max\Big\{ \sup_{r \le t \le s} \big(F(t) - F^r(t)\big),\ \sup_{r \le t \le s} \big(F^s(t) - F(t)\big) \Big\}.$$

Let m(F) be the point in I such that $\tilde f = U_I^{m(F)}(F)$. By Lemma 5 we get 

$$\|f^m - \tilde f\| \le 2 \max\Big\{ \sup_{t \in I} |F(t) - F^m(t)|,\ \sup_{t \in I} |F(t) - F^{m(F)}(t)| \Big\}.$$

By definition of m(F) 

$$\sup_{t \in I} |F(t) - F^{m(F)}(t)| \le \sup_{t \in I} |F(t) - F^m(t)|,$$

and therefore 

(28) $$\|f^m - \tilde f\| \le 2 \sup_{t \in I} |F(t) - G(t)| + 2 \sup_{t \in I} |G(t) - F^m(t)|.$$

For the last term on the right-hand side of (28), we have 

(29) $$2 \sup_{t \in I} |G(t) - F^m(t)| = 2 \max\Big\{ \sup_{t \le m} |G(t) - F^m(t)|,\ \sup_{t \ge m} |G(t) - F^m(t)| \Big\},$$

so by Marshall's lemma, 

(30) $$2 \sup_{t \in I} |G(t) - F^m(t)| \le 2 \max\Big\{ \sup_{t \le m} |G(t) - F(t)|,\ \sup_{t \ge m} |G(t) - F(t)| \Big\} \le 2 \sup_{t \in I} |F(t) - G(t)|.$$

Substituting (30) in (28) leads to Lemma 2. 
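Marshall's lemma, used to obtain (30), states that replacing F by its least concave majorant cannot increase the sup-distance to a concave G. The sketch below illustrates this on a grid: it computes the least concave majorant of a finite set of points with an upper convex hull scan and compares the two sup-distances. The function name `least_concave_majorant` and the data are illustrative assumptions; the paper works with functions on an interval, not with grid values.

```python
def least_concave_majorant(xs, ys):
    """Values at xs of the least concave majorant of the points (xs[i], ys[i]).
    xs must be strictly increasing."""
    hull = []  # indices of the vertices of the upper convex hull
    for i in range(len(xs)):
        # Drop the last vertex while it lies on or below the chord to point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = ((xs[b] - xs[a]) * (ys[i] - ys[a])
                     - (ys[b] - ys[a]) * (xs[i] - xs[a]))
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    out = []
    for a, b in zip(hull, hull[1:]):
        slope = (ys[b] - ys[a]) / (xs[b] - xs[a])
        out.extend(ys[a] + slope * (xs[i] - xs[a]) for i in range(a, b))
    out.append(ys[hull[-1]])
    return out

xs = [0.0, 1.0, 2.0, 3.0]
F = [0.0, 0.2, 0.9, 1.0]   # hypothetical observed cumulative values
G = [0.0, 0.5, 0.8, 1.0]   # a concave target (decreasing increments)
F_lcm = least_concave_majorant(xs, F)
sup_before = max(abs(f - g) for f, g in zip(F, G))
sup_after = max(abs(f - g) for f, g in zip(F_lcm, G))
print(F_lcm, sup_after, sup_before)
```

The convex case (greatest convex minorant on the right of m) is symmetric: apply the same routine to the negated values.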

5.3. Proof of Lemma 4. Let u be the function defined on J by 
u = h - h(J) and let U be defined on J by 

$$U(t) = \int_{t_0}^{t} u(x)\,dx = H(t) - H(t_0) - (t - t_0) h(J).$$

Then $u(t_0) \ge 0$ and $u(t_1) \le 0$, so that there exists some $c \in J$ where U 
achieves its maximum. Moreover, u is nonnegative before c and nonpositive 
after c. Since $U(t_1) = 0$, we thus get 

$$b_h(J) = \int_J |u(t)|\,dt = \int_{t_0}^{c} u(t)\,dt - \int_{c}^{t_1} u(t)\,dt = 2 \sup_{t \in J} U(t),$$

which proves the lemma. 
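The identity of Lemma 4 can be checked by hand on a nonincreasing step function, for which both sides are finite sums. The following sketch does this in exact arithmetic; the name `lemma4_sides` and the numerical values are hypothetical.

```python
from fractions import Fraction

def lemma4_sides(values, lengths):
    """For a nonincreasing step function h (value values[i] on a cell of length
    lengths[i]), return (b_h(J), 2 * sup_t U(t)),
    where U(t) = H(t) - H(t0) - (t - t0) * h(J)."""
    v = [Fraction(x) for x in values]
    w = [Fraction(x) for x in lengths]
    mean = sum(a * b for a, b in zip(v, w)) / sum(w)      # h(J)
    b_h = sum(abs(a - mean) * b for a, b in zip(v, w))    # integral of |h - h(J)|
    # U is piecewise linear with U(t0) = 0, so its supremum is reached
    # at t0 or at the right end of a cell.
    u, sup_u = Fraction(0), Fraction(0)
    for a, b in zip(v, w):
        u += (a - mean) * b
        sup_u = max(sup_u, u)
    return b_h, 2 * sup_u

print(lemma4_sides([5, 3, 3, 0], [1, 2, 1, 2]))
```

Here h(J) = 7/3 and both sides equal 28/3.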
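6. Proof of Theorem 3. In the sequel we adopt the convention and 
notation used in Section 5. 

Let $\pi \in \Pi(I)$ and let $p_\pi g$ be the $L_2$-orthogonal projection of g on $\mathcal{H}_\pi$, 
defined by (25). To prove the right-hand side inequality of (5), we can 
write 

$$\begin{aligned} \|\hat g_\pi - g\| &= \sum_{k \in \mathcal{K}} \int_{I_k} |g(t) - p_\pi g(t) + p_\pi g(t) - \hat g_\pi(t)|\,dt \\ &\le \sum_{k \in \mathcal{K}} \int_{I_k} |g(t) - p_\pi g(t)|\,dt + \sum_{k \in \mathcal{K}} \int_{I_k} |p_\pi g(t) - \hat g_\pi(t)|\,dt \\ &\le \|p_\pi g - g\| + \sum_{k \in \mathcal{K}} |Z(t_k) - Z(t_{k-1})| \\ &\le 2 d(g, \mathcal{H}_\pi) + \sum_{k \in \mathcal{K}} \sup_{t \in I_k} |Z(t) - Z(t_{k-1})|. \end{aligned}$$

The last control of $\|p_\pi g - g\|$ arises from (26). We get the result by a last 
obvious majorization (C > 1). Let us now prove the left-hand side inequality 
of (5). We get 

(31) $$\begin{aligned} \|\hat g_\pi - g\| &\ge \sum_{k \in \mathcal{K}} \Big| \int_{I_k} \big( g(t) - p_\pi g(t) + p_\pi g(t) - \hat g_\pi(t) \big)\,dt \Big| \\ &= \sum_{k \in \mathcal{K}} \Big| \int_{I_k} \big( p_\pi g(t) - \hat g_\pi(t) \big)\,dt \Big| \\ &= \sum_{k \in \mathcal{K}} |Z(t_k) - Z(t_{k-1})|. \end{aligned}$$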

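On the other hand, by the triangle inequality, 

(32) $$\|\hat g_\pi - g\| \ge \|g - p_\pi g\| - \|\hat g_\pi - p_\pi g\| = \|g - p_\pi g\| - \sum_{k \in \mathcal{K}} |Z(t_k) - Z(t_{k-1})|.$$

Multiplying (31) by (CA + 4) and (32) by 4, and summing the so-obtained 
inequalities, we get, since $p_\pi g \in \mathcal{H}_\pi$, 

$$\begin{aligned} (CA + 8) \|\hat g_\pi - g\| &\ge 4 \|g - p_\pi g\| + CA \sum_{k \in \mathcal{K}} |Z(t_k) - Z(t_{k-1})| \\ &\ge 4 d(g, \mathcal{H}_\pi) + CA \sum_{k \in \mathcal{K}} |Z(t_k) - Z(t_{k-1})|. \end{aligned}$$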

Therefore, taking the expectations, we get that when condition (4) holds, 

$$\mathcal{R}_Z(\pi) \le 4 d(g, \mathcal{H}_\pi) + CA \sum_{k \in \mathcal{K}} E|Z(t_k) - Z(t_{k-1})| \le (CA + 8)\, E\|\hat g_\pi - g\|.$$

Relation (6) is straightforwardly derived from the last inequality and 
Theorem 2. 
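The identity behind (31) is that, on each cell, the integral of the difference between the projection $p_\pi g$ and the histogram built on $\pi$ is the increment of Z = F - G over that cell. The sketch below checks this on a toy example, assuming that the histogram on $I_k$ equals $(F(t_k) - F(t_{k-1}))/l(I_k)$ with F the counting function of the observed events; the function names, the event times and the choice $G(t) = t^2$ are hypothetical and only serve the illustration.

```python
def histogram_rate(event_times, endpoints):
    """Histogram on the partition: on I_k it equals
    (F(t_k) - F(t_{k-1})) / l(I_k), F being the counting function of the events."""
    rates = []
    for left, right in zip(endpoints, endpoints[1:]):
        count = sum(1 for t in event_times if left < t <= right)
        rates.append(count / (right - left))
    return rates

def sum_abs_increments(event_times, endpoints, G):
    """Sum over k of |Z(t_k) - Z(t_{k-1})| with Z = F - G."""
    total = 0.0
    for left, right in zip(endpoints, endpoints[1:]):
        count = sum(1 for t in event_times if left < t <= right)
        total += abs(count - (G(right) - G(left)))
    return total

def l1_projection_gap(event_times, endpoints, G):
    """Integral over I of |p_pi g - histogram|, computed cell by cell."""
    rates = histogram_rate(event_times, endpoints)
    total = 0.0
    for rate, (left, right) in zip(rates, zip(endpoints, endpoints[1:])):
        mean_g = (G(right) - G(left)) / (right - left)   # g(I_k)
        total += abs(mean_g - rate) * (right - left)
    return total

events = [0.1, 0.2, 0.7, 1.5]
partition = [0.0, 0.5, 1.0, 2.0]
G = lambda t: t * t   # hypothetical cumulative function
print(histogram_rate(events, partition))
print(l1_projection_gap(events, partition, G), sum_abs_increments(events, partition, G))
```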

7. Short proofs for Propositions 1-4. Let us set $Z = \hat G - G$. For each 
proposition, the proof follows the same scheme: to show (8) [resp. (10)], one 
needs to control the error term $\mathcal{R}_Z(\pi)$ in Theorem 2, for all $\pi \in \Pi(I)$. For 
this purpose we fix a partition $\pi$ in $\Pi_D(I)$, $D \le n$ (resp. $D \le T$), and we 
use the same notation as in the former section, setting moreover 
$G(I_k) = G(t_k) - G(t_{k-1})$. We then show that there exists some C' such that 

(33) $$\sum_{k=1}^{D} E\Big( \sup_{t \in I_k} |Z(t) - Z(t_{k-1})| \Big) \le C' \sqrt{\frac{D}{n}},$$

respectively 

$$\frac{1}{T} \sum_{k=1}^{D} E\Big( \sup_{t \in I_k} |Z(t) - Z(t_{k-1})| \Big) \le 2 \sqrt{\frac{G(T)}{T}} \sqrt{\frac{D}{T}}.$$

Therefore, we get the result applying Theorem 2 with E = CC' (resp. 
B = 2C). Next, to show (9), one needs to check condition (4) in Theorem 3, 
for all $\pi \in \Pi(I)$. 

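Proof of Proposition 1. To prove (33), let us call F the common 
conditional distribution function of the $X_i$'s given that $X_i \in I_k$. Let 
$N = \sum_{i=1}^{n} 1_{X_i \in I_k}$ be the number of observations falling in $I_k$ and let $F_N$ be the 
empirical distribution of the N observations falling in $I_k$. We get for all $t \in I_k$, 

$$F(t) = \frac{G(t) - G(t_{k-1})}{G(I_k)} \quad\text{and}\quad F_N(t) = \frac{n}{N} \big( \hat G(t) - \hat G(t_{k-1}) \big).$$

Therefore, 

(34) $$E\Big( \sup_{t \in I_k} |Z(t) - Z(t_{k-1})| \Big) \le E\Big( \sup_{t \in I_k} \frac{N}{n} |F_N(t) - F(t)| \Big) + E\Big| \frac{N}{n} - G(I_k) \Big|.$$

For the first term on the right-hand side of (34), an upper bound can be 
derived applying Massart's inequality [20] to $F_N$ on $I_k$, 

$$P\Big( \sup_{t \in I_k} |F_N(t) - F(t)| > \lambda \,\Big|\, N \Big) \le 2 e^{-2N\lambda^2} \quad \forall \lambda > 0.$$

Integrating the latter inequality leads to 

(35) $$E\Big( \sup_{t \in I_k} \frac{N}{n} |F_N(t) - F(t)| \Big) \le \sqrt{\frac{\pi}{2}}\, E\Big( \frac{\sqrt{N}}{n} \Big).$$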
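The integration step leading to (35) can be written out as follows. It only uses the tail formula for the expectation of a nonnegative random variable and a Gaussian integral, conditionally on N with $N \ge 1$:

```latex
\begin{aligned}
E\Big(\sup_{t\in I_k}|F_N(t)-F(t)| \,\Big|\, N\Big)
 &= \int_0^\infty P\Big(\sup_{t\in I_k}|F_N(t)-F(t)| > \lambda \,\Big|\, N\Big)\,d\lambda \\
 &\le \int_0^\infty 2e^{-2N\lambda^2}\,d\lambda
  = \sqrt{\frac{\pi}{2N}} .
\end{aligned}
```

Multiplying by N/n and taking the expectation with respect to N gives the bound $\sqrt{\pi/2}\, E(\sqrt{N})/n$.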
A last control of (35) can be performed by the Cauchy-Schwarz inequality, 
leading to 

$$E\Big( \sup_{t \in I_k} \frac{N}{n} |F_N(t) - F(t)| \Big) \le \sqrt{\frac{\pi}{2}} \sqrt{\frac{G(I_k)}{n}}.$$

On the other hand, the second term on the right-hand side of (34) can be 
bounded by the Cauchy-Schwarz inequality applied to $N \sim \mathcal{B}(n, G(I_k))$. 
We thus obtain 

(36) $$E\Big( \sup_{t \in I_k} |Z(t) - Z(t_{k-1})| \Big) \le \Big( 1 + \sqrt{\frac{\pi}{2}} \Big) \sqrt{\frac{G(I_k)}{n}}.$$

We then obtain (33), since 

$$\sum_{k=1}^{D} \sqrt{\frac{G(I_k)}{n}} \le \sqrt{\frac{D}{n}}.$$

To prove (9), one needs to sharpen the bound (36): actually, it gives the 
right order of magnitude of the supremum on each $I_k$ such that $G(I_k) \ge 1/n$, 
but it is too crude when $G(I_k) < 1/n$. Both $\hat G$ and G are monotone, so that 
for all k, 

$$E\Big( \sup_{t \in I_k} |Z(t) - Z(t_{k-1})| \Big) \le 2 G(I_k).$$

Combining this inequality with (36) yields 

$$E\Big( \sup_{t \in I_k} |Z(t) - Z(t_{k-1})| \Big) \le \min\Big\{ 2 G(I_k),\ \Big( 1 + \sqrt{\frac{\pi}{2}} \Big) \sqrt{\frac{G(I_k)}{n}} \Big\}.$$

On the other hand, by a lemma of Devroye and Györfi (see [12], page 25), 

(37) $$E|Z(t_k) - Z(t_{k-1})| \ge \min\Big\{ 0.13\, G(I_k),\ 0.36 \sqrt{\frac{G(I_k)}{n}} \Big\},$$

so there exists an absolute constant A such that (4) holds. □ 

Proof of Proposition 2. Let H and B be the processes defined on 
[0, 1] by H = E(Z) and B = Z - H. 

The $\varepsilon_i$'s are independent, so that applying the Cauchy-Schwarz 
inequality, we get 

[nt fe ] 



(38) 



E(swp\B(t)-B(t k ^)\) <!e( 2 

fc \*=[ntfc_i]+l / 



< ^(Vn(*fe-tfc_i) + l). 
On the other hand, g is unimodal, so one can prove that 

(39) suplHifi-Hfa-JlK—. 

tei k n 

Since 

D 



n 
1=1 ' 



relations (39) and (38) lead to (33), with $C' = 2(3M + \sigma)$. 

To prove (9), let us only consider k such that $[nt_k] \ne [nt_{k-1}]$ [otherwise 
(4) is trivial for all A > 0]. We use a Von Bahr-Esseen inequality (see, e.g., 
[27], page 858), which leads to 

$$E\Big( \sup_{t \in I_k} |B(t) - B(t_{k-1})| \Big) \le 8 E|B(t_k) - B(t_{k-1})|.$$

Using (39) and the preceding display, the triangle inequality gives 
(40) 8E(|Z(t fc ) - Z(t fc _i)|) > e( sup - Bfa-i)^ - 8—. 



teh 



n 



Using Markov's inequality, we get 



(41) E\Z(t k ) - Z{t k . x )\ > ^F(\Z(t k ) - Z{t k _ x )\ > ^) > 

The last inequality arises from the fact that $Z(t_k) - Z(t_{k-1})$ is a centered 
Gaussian variable whose variance is greater than $\sigma^2/n^2$. 

Now, multiplying (41) by $432M/\sqrt{2\pi}$ and by summation with (40), there 
exists an A' such that 

$$E\Big( \sup_{t \in I_k} |B(t) - B(t_{k-1})| \Big) \le A' E|Z(t_k) - Z(t_{k-1})|.$$

The triangle inequality and relations (39) and (41) lead to the fact that 
there exists an A such that condition (4) holds. □ 

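Proof of Proposition 3. Let us set for all $t \in I$ $g^*(t) = 1_{X_{(n)} \ge t}\, g(t)$ 
and $G^*(t) = \int_0^t g^*(s)\,ds$, where $X_{(n)}$ is the nth order statistic of the sample. 
Setting $Z^* = \hat G - G^*$, we get 

(42) $$\sum_{k=1}^{D} E\Big( \sup_{t \in I_k} |Z(t) - Z(t_{k-1})| \Big) \le \sum_{k=1}^{D} E\Big( \sup_{t \in I_k} |Z^*(t) - Z^*(t_{k-1})| \Big) + E\Big( \int_I 1_{X_{(n)} < s}\, g(s)\,ds \Big).$$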
The process $(Z^*(t))_{t \ge 0}$ is a square integrable mean zero martingale. Its 
predictable variation process is given by (see, e.g., [27], Theorem 2, page 312) 

(43) $$\langle Z^* \rangle(t) = \int_0^t \frac{1_{X_{(n)} \ge s}}{n (1 - \hat L^-(s))}\, g(s)\,ds,$$

where $\hat L^-$ is the left-hand continuous version of the empirical distribution 
function of the $X_i$'s. Using Doob's inequality, relation (43) combined with 
the Cauchy-Schwarz inequality yields for all k, 

e( sup \Z*(t) - Z*{t k - X )\) < 2\/E((Z*)(t fc )-(Z*)(t fc _i)) 
\ teh J 



2 

< 

in 



_/ 1-Ls \ g(s) , 

E sup - \ ' , ds. 

\s<x (n) l-L-(s)Jl-L(s) 

Setting H as the common distribution function of the $U_i$'s [we get 
1 - L = (1 - F)(1 - H)], we thus apply Gill's inequality (see [27]), which leads to 



if sup \z*(t) - z*(t k ^)\) < 

\tei k J v n V 1 ~ 



H(c) I/ 1 - F(t k ) 1 - F(t 



k-lj 



Finally, 



(44) Y1 E ( sap\Z*(t)-Z*(t k -x)\) < 



D 

k^Vteh'" w ~ v " w_i/ 7 - V n ^/l-Lfc)' 
For the last term on the right-hand side of (42), we use the relations 

$$P(X_{(n)} < s) = \big( 1 - (1 - H(s))(1 - F(s)) \big)^n \quad\text{and}\quad \Big( 1 - \frac{s}{n} \Big)^n \le e^{-s}$$

for all $s \le n$. 

Simple calculations yield 

(45) ^j\ x ^ g( , )ds )<f- 7T ^. 
Combining (44) and (45) in (42) yields (33). □ 

Proof of Proposition 4. The process $(Z(t))_{t \ge 0}$ is a square 
integrable mean zero martingale. Since $N(t_k) - N(t_{k-1}) \sim \mathcal{P}(G(I_k))$, applying 
the Doob and Cauchy-Schwarz inequalities thus leads to 

$$E\Big( \sup_{t \in I_k} |Z(t) - Z(t_{k-1})| \Big) \le 2 \sqrt{G(I_k)},$$

so that 

$$\frac{1}{T} \sum_{k=1}^{D} E\Big( \sup_{t \in I_k} |Z(t) - Z(t_{k-1})| \Big) \le 2 \sqrt{\frac{G(T)}{T}} \sqrt{\frac{D}{T}}.$$

This proves (33). 
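The passage from the cellwise bound $2\sqrt{G(I_k)}$ to the summed bound is a Cauchy-Schwarz step: the sum of the $\sqrt{G(I_k)}$ over D cells is at most $\sqrt{D\,G(T)}$. The sketch below evaluates both sides for one partition, assuming I = [0, T] and G(0) = 0; the U-shaped rate $g(t) = 1 + (t-1)^2$, the partition and the name `penalty_check` are hypothetical.

```python
import math

def penalty_check(G, endpoints):
    """Return (lhs, rhs) with lhs = (1/T) * sum_k 2*sqrt(G(I_k)) and
    rhs = 2*sqrt(G(T)/T)*sqrt(D/T), for a partition of [0, T] into D cells."""
    T = endpoints[-1]
    D = len(endpoints) - 1
    masses = [G(b) - G(a) for a, b in zip(endpoints, endpoints[1:])]
    lhs = sum(2.0 * math.sqrt(m) for m in masses) / T
    rhs = 2.0 * math.sqrt((G(T) - G(endpoints[0])) / T) * math.sqrt(D / T)
    return lhs, rhs

# Hypothetical U-shaped rate g(t) = 1 + (t - 1)^2 on [0, 2], with G(0) = 0.
G = lambda t: t + ((t - 1.0) ** 3 + 1.0) / 3.0
lhs, rhs = penalty_check(G, [0.0, 0.3, 1.0, 1.2, 2.0])
print(lhs, rhs)
```

Equality holds only when all cells carry the same mass $G(I_k)$, which is why partitions adapted to G keep the variance term small.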

To prove (9), let us set for all $x \ge 0$ $M(x) = N_1(x) - x$, where $(N_1(x))_{x \ge 0}$ 
is the Poisson process with mean function x. Recall that $X_j = G(T_j)$ is the 
jth occurrence time of $(N_1(x))_{x \ge 0}$. Then let us set $J = [x_{k-1}, x_k]$, where 
$x_k = G(t_k)$ and $x_{k-1} = G(t_{k-1})$ and let $(a_i)_{0 \le i \le m}$ be the sequence of endpoints 
of a uniform partition of J. Since $N_1$ has independent increments, the m 
random variables $M(a_i) - M(a_{i-1})$ are integrable i.i.d. mean zero variables 
and we can apply a Von Bahr-Esseen inequality (see [27], page 858), 

$$E\Big( \max_{1 \le i \le m} |M(a_i) - M(x_{k-1})| \Big) \le 8 E|M(x_k) - M(x_{k-1})|.$$

Moreover, $N_1$ is a cadlag process. Therefore, this inequality holds on the 
whole interval J, when the partition's step tends toward zero. We thus get 

$$E\Big( \sup_{x \in J} |M(x) - M(x_{k-1})| \Big) \le 8 E\big( |M(x_k) - M(x_{k-1})| \big)$$

and then 

$$E\Big( \sup_{t \in I_k} |Z(t) - Z(t_{k-1})| \Big) \le 8 E\big( |Z(t_k) - Z(t_{k-1})| \big). \quad □$$
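The time change $X_j = G(T_j)$ used in this proof also gives a simple way to simulate failure times: generate the arrivals $X_j$ of a unit-rate Poisson process and map them back through $G^{-1}$. The sketch below assumes that G is continuous and strictly increasing with G(0) = 0, and inverts it by bisection; the function name `simulate_nhpp`, the seed and the U-shaped rate $g(t) = 1 + (t-1)^2$ are hypothetical choices for illustration.

```python
import random

def simulate_nhpp(G, T, rng, tol=1e-10):
    """Failure times on [0, T] of a Poisson process with continuous, strictly
    increasing mean function G (G(0) = 0), obtained as T_j = G^{-1}(X_j)
    from the arrivals X_j of a unit-rate Poisson process."""
    times, x = [], 0.0
    total = G(T)
    while True:
        x += rng.expovariate(1.0)   # next arrival X_j of the unit-rate process
        if x > total:
            return times
        lo, hi = 0.0, T             # invert G by bisection
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if G(mid) < x:
                lo = mid
            else:
                hi = mid
        times.append(0.5 * (lo + hi))

# Hypothetical U-shaped rate g(t) = 1 + (t - 1)^2 on [0, 2].
G = lambda t: t + ((t - 1.0) ** 3 + 1.0) / 3.0
failure_times = simulate_nhpp(G, 2.0, random.Random(0))
print(len(failure_times), failure_times)
```

Such simulated samples can be used to compare empirically the histogram and shape respecting estimates on a single trajectory.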